perf: Optimize settings loading with bulk JSON decoding (3.8x faster) by vilsonrodrigues · Pull Request #12 · msgflux/msgspec-ext

vilsonrodrigues · 2025-11-27T04:39:24Z

Summary

Major performance optimization refactoring BaseSettings to use msgspec.defstruct and bulk JSON decoding instead of field-by-field validation.

Performance Results

Before: 0.933ms per load (sequential Python validation)
After: 0.702ms per load (bulk C-level JSON decode)
Improvement: 33% faster vs previous implementation
vs Pydantic: 3.8x faster than pydantic-settings 🚀

Key Optimizations

Bulk JSON Decoding: All validation in C via msgspec.json.decode
Cached Encoders/Decoders: Reuse instances to eliminate instantiation overhead
Automatic Field Ordering: Required fields before optional (prevents defstruct errors)
Fixed Optional Bug: Correct Union type detection for Optional[T]

Architecture

Before (Sequential):

for field in fields:
    value = msgspec.convert(env_value, field_type)  # Python loop, slow

After (Bulk):

json_bytes = encoder.encode(all_values)  # Cached encoder
return decoder.decode(json_bytes)  # Cached decoder, all in C!

Testing

✅ 22 comprehensive unit tests covering:

Basic settings, env loading, type conversion
Optional fields, .env files, validation
Serialization methods, edge cases

✅ 5 practical examples:

Basic usage
Environment prefixes
.env files
Advanced types
Serialization

All tests pass: 22/22 ✅
Test suite: 2x faster (0.10s → 0.05s)

Bug Fixes

Optional Type Detection

# Before (broken):
if origin is type(None):  # Never true!

# After (correct):
if origin is Union:  # ✅ Correctly detects Optional[T]
    non_none = [a for a in args if a is not type(None)]
    if len(non_none) == 1:
        field_type = non_none[0]

Field Ordering

Automatically orders required before optional to prevent defstruct errors.

Changes Summary

Core optimization commits:

Initial bulk JSON decoding refactor
Encoder/decoder caching
Fixed Optional type bug
Automatic field ordering

Testing & docs:

22 comprehensive unit tests
5 practical examples with README
Updated benchmark results
Cleaned up unused code

No Breaking Changes

Same API, just faster:

class AppSettings(BaseSettings):
    name: str
    port: int = 8000

settings = AppSettings()  # Now 33% faster! ⚡

🎯 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

Major performance optimization by refactoring BaseSettings to use msgspec.defstruct and bulk JSON decoding instead of field-by-field validation. ## Key Changes **Architecture:** - BaseSettings now acts as a wrapper factory using __new__ - Dynamically creates msgspec.Struct classes via defstruct - Returns native Struct instances (maintains full compatibility) **Optimization Strategy:** 1. Collect all environment variables at once 2. Preprocess string values to JSON-compatible types 3. Use msgspec.json.encode() + msgspec.json.decode() for bulk validation 4. All validation and type conversion happens in C (not Python) **Performance Improvement:** - Before: 0.933ms per settings load (sequential validation) - After: 0.685ms per settings load (bulk JSON decode) - **36% faster** (1.36x speedup) 🚀 ## Benefits - ✅ **Faster**: Bulk validation in C vs Python loops - ✅ **Compatible**: API remains unchanged (Settings() still works) - ✅ **Clean**: Leverages msgspec's native performance - ✅ **Maintainable**: Simpler code with less custom validation logic ## Implementation Details - Uses msgspec.defstruct() to create Struct classes dynamically - Injects helper methods (model_dump, model_dump_json, schema) - Caches Struct classes to avoid repeated creation - Handles type conversion (bool, int, float, JSON types) - Maintains support for env_prefix, case_sensitive, .env files ## Testing - ✅ All existing tests pass - ✅ New implementation tested with various field types - ✅ Benchmark shows 36% performance improvement This optimization maintains the familiar pydantic-like API while maximizing msgspec's performance advantages. 🎯 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Remove import json (not needed, using msgspec for everything) - Keep imports clean and minimal

## Tests (22 test cases) - ✅ Basic settings creation with defaults - ✅ Environment variable loading and type conversion - ✅ Boolean conversion (true/false/1/0/yes/no variants) - ✅ Environment prefixes (env_prefix) - ✅ .env file loading - ✅ Optional fields (str | None) - ✅ Complex types (lists, dicts from JSON env vars) - ✅ Validation errors (missing required, wrong types) - ✅ Case sensitivity handling - ✅ model_dump(), model_dump_json(), schema() methods - ✅ Struct instance verification - ✅ Class caching - ✅ Env var priority (defaults < env < explicit) ## Examples (5 practical examples) 1. **01_basic_usage.py** - Fundamentals of settings management 2. **02_env_prefix.py** - Using env_prefix for namespacing 3. **03_dotenv_file.py** - Loading from .env files 4. **04_advanced_types.py** - Complex types (Optional, lists, dicts) 5. **05_serialization.py** - Serialization and schema generation Each example includes: - Runnable code with clear output - Best practices and tips - Real-world use cases ## Test Coverage All core functionality tested: - Environment loading ✓ - Type conversion ✓ - Validation ✓ - Serialization ✓ - Edge cases ✓ All tests pass: 22/22 ✅ All examples run successfully ✅

## Performance Optimizations ### 1. Cached Encoders and Decoders - Reuse `msgspec.json.Encoder` and `msgspec.json.Decoder` instances - Avoid repeated instantiation overhead - `_encoder_cache` and `_decoder_cache` as class variables ### 2. Automatic Field Ordering - Required fields now automatically placed before optional fields - Prevents "Required field cannot follow optional fields" error - Safer and more robust struct creation ### 3. Fixed Optional Type Bug - Corrected Union type detection logic - Changed from `origin is type(None)` (never true) to `origin is Union` - Properly unwraps `Optional[T]` → `Union[T, NoneType]` → `T` - Example: `Optional[int]` now correctly detected and unwrapped ## Code Quality - Added `Union` import from typing - Improved error handling with chained exceptions - Better comments explaining the optimizations ## Testing - ✅ All 22 tests pass - ✅ Tests run 2x faster (0.10s → 0.05s) - ✅ Benchmark maintains performance: 0.702ms per load - ✅ Examples still work correctly ## Technical Details **Before (Union bug):** ```python origin = get_origin(field_type) if origin is type(None) or origin is type(int | None): # Never true! ... ``` **After (correct):** ```python origin = get_origin(field_type) if origin is Union: # Correctly detects Optional[T] args = get_args(field_type) non_none_types = [a for a in args if a is not type(None)] if len(non_none_types) == 1: field_type = non_none_types[0] ``` **Field ordering:** ```python # Before: Mixed order could cause errors fields = [(name, type, default), ...] # After: Required first, then optional required_fields = [(name, type), ...] optional_fields = [(name, type, default), ...] fields = required_fields + optional_fields ``` These optimizations make the code more robust while maintaining peak performance.

- Updated benchmark from 0.933ms to 0.702ms (33% improvement) - msgspec-ext now 3.8x faster than pydantic-settings (was 2.9x) - Added key optimizations list to performance section - Updated comparison table with new performance numbers - Changed Python version reference from 3.13 to 3.12 (actual test env)

- Remove dynaconf from benchmark comparisons (not used) - Remove unused imports (tempfile, Path) - Update docstring to reflect current comparisons - Add .benchmarks/ to gitignore (pytest-benchmark cache directory) - Simplify benchmark output to focus on msgspec-ext vs pydantic

- Add ClassVar annotations for class-level caches - Import ClassVar from typing - Fix docstring formatting (D212) - Remove unused _apply_defaults method - All checks pass for src/ directory Lint results: - Before: 4 errors in src/ - After: 0 errors (all checks passed) All 22 tests still passing ✅

- Add S104 (binding to all interfaces) ignore for examples/tests - Add F401 (unused imports) ignore for examples/tests - Add PLC0415 (top-level import) ignore for tests - Run ruff format on all files (3 files reformatted) - Update pyproject.toml per-file-ignores All checks now pass ✅ All 22 tests passing ✅

vilsonrodrigues · 2025-11-27T05:06:35Z

/merge

github-actions · 2025-11-27T05:06:49Z

✅ PR merged successfully by @vilsonrodrigues!

vilsonrodrigues and others added 8 commits November 27, 2025 00:11

refactor: Remove unused json import

6ad8c17

- Remove import json (not needed, using msgspec for everything) - Keep imports clean and minimal

github-actions bot merged commit 5fa229d into msgflux:main Nov 27, 2025
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: Optimize settings loading with bulk JSON decoding (3.8x faster)#12

perf: Optimize settings loading with bulk JSON decoding (3.8x faster)#12
github-actions[bot] merged 8 commits intomsgflux:mainfrom
vilsonrodrigues:feat/optimize-with-struct

vilsonrodrigues commented Nov 27, 2025

Uh oh!

vilsonrodrigues commented Nov 27, 2025

Uh oh!

Uh oh!

github-actions bot commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vilsonrodrigues commented Nov 27, 2025

Summary

Performance Results

Key Optimizations

Architecture

Testing

Bug Fixes

Optional Type Detection

Field Ordering

Changes Summary

No Breaking Changes

Uh oh!

vilsonrodrigues commented Nov 27, 2025

Uh oh!

Uh oh!

github-actions bot commented Nov 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant